Gina Valdivia, Luna Xu, Isabella Seo, and Owen Handelman
Published
December 9, 2024
Abstract
This report examines disparities in community program accessibility across socioeconomic levels, focusing on program type, seasonality, and delivery format. The findings reveal significant inequities. Low-SES neighborhoods have fewer programs and scholarships, despite leading in free food offerings. Mid-SES areas lack free and paid opportunities, while High-SES neighborhoods have more scholarships and transport support. Programs are more frequent in warmer months, with similar offerings year-round for all age groups. Online programs are more likely to offer financial benefits, while distance to train stations has no impact on in-person program benefits. We recommend expanding programs and scholarships in Low-SES neighborhoods, encouraging more events and diverse offerings. More digital media programs for different age groups, along with more year-round options and a balanced distribution of free programs, are needed. Increasing online programs would better support economically disadvantaged individuals. Exploring easy transportation options (e.g., bus/train passes) would further improve participation. Finally, targeted investment in High-Hardship Index areas with community outreach is crucial to bridge accessibility gaps and promote equity across all communities.
1 Problem statement
This study explores how socioeconomic, environmental, and logistical factors shape the accessibility and equity of community programs. It investigates the influence of community socioeconomic status (SES) on program availability, the evolution of equity-focused features (such as scholarships, free food, and transportation) in low-SES areas, the effect of climate and seasonal timing on program types and costs for various age groups, and how program location (online vs. in-person) and accessibility via public transportation impact support for participant access. We aim to gain insight into program distribution across the Chicago community by identifying where, when, and what kinds of programs are currently offered for different communities. This would help the My CHI. My Future. team collaborate more effectively with partners, prioritize community needs, and work toward their mission of connecting all Chicago youth with meaningful out-of-school opportunities. It is imperative that we ensure a more equitable and accessible range of opportunities for youth across the city.
2 Data sources
My Chi. My Future: This dataset provides comprehensive information on all programs under the My Chi. My Future initiative, including key details such as accessibility features, start/end dates, and location. By analyzing various aspects of this dataset, we aim to uncover trends and correlations that address the primary research questions. The version of this dataset used for analysis was downloaded on November 13, 2024.
Chicago Temperature: This dataset contains monthly average temperature records for Chicago from 2000 to 2024. It will be used to evaluate the influence of temperature and seasonality on the types and pricing of community programs, particularly focusing on how weather conditions affect program offerings across different age groups (Research Question 3).
Chicago Census Data: The hardship index data provides insights into socioeconomic conditions across various Chicago communities, which is crucial for analyzing the relationship between community socioeconomic status and program accessibility (Research Questions 1 and 2).
CTA Info: This dataset includes extensive information on Chicago Transit Authority station and bus stop locations. It will be utilized to examine the proximity of community programs to public transportation, helping to understand the effects of accessibility on program participation and the support provided (Research Question 4).
Chicago Boundaries: The boundaries dataset provides geographical data on Chicago’s neighborhoods, which allows for categorizing the different programs in the My Chi. My Future dataset by community area. This categorization is essential for analysis related to community socioeconomic factors and their impact on accessibility (Research Questions 1 and 2).
3 Data quality check / cleaning / preparation
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from shapely.geometry import Point
from shapely import wkt
import geopandas as gpd
from tabulate import tabulate

# We all called the main MCMF dataset something different in our explorations.
mcmf_data = pd.read_csv('My_CHI._My_Future._Programs_20241113.csv')  # distribution of variables
my_chi_df = pd.read_csv('My_CHI._My_Future._Programs_20241113.csv')  # Q1 exploration (Gina)
project_data = pd.read_csv('My_CHI._My_Future._Programs_20241113.csv')  # Q2 exploration (Luna)
mcmf_project_data = pd.read_csv('My_CHI._My_Future._Programs_20241113.csv')  # Q3 exploration (Isabella)
project_data_full = pd.read_csv('My_CHI._My_Future._Programs_20241113.csv')  # Q4 exploration (Owen)

chi_nei = pd.read_csv('CommAreas_20241114.csv')  # Q1, Q2
comm_areas_df = pd.read_csv('CommAreas_20241114.csv')  # Q2
income_df = pd.read_csv('Census_Data.csv')  # Q2
temp_data = pd.read_csv('chicago_monthly_temp_avg.csv')  # Q3
bus_data_full = pd.read_csv('CTA_BusStops_20241118.csv')  # Q4
train_data_full = pd.read_csv('CTA_-_System_Information_-_List_of__L__Stops_20241118.csv')  # Q4
Below is the distribution of data that was used from the MCMF dataset, which was used by all members of the group. The data has not yet been manipulated (unless stated in the description). MCMF Dataset:
# Filter to only the relevant columns used in the analysis
relevant_columns = ['COMMUNITY AREA NAME', 'HARDSHIP INDEX']
census_data_filtered = census_data_df[relevant_columns]

# Function to summarize categorical variables
def summarize_categorical(data, column):
    summary = {}
    summary['Variable'] = column
    summary['Type'] = 'Categorical'
    summary['Missing Values'] = data[column].isna().sum()
    summary['Unique Values'] = data[column].nunique()
    if summary['Unique Values'] > 5:
        # Include top 5 levels if too many
        summary['Top Levels'] = data[column].value_counts().nlargest(5).to_dict()
    else:
        # Include all levels if few
        summary['Top Levels'] = data[column].value_counts().to_dict()
    return summary

# Function to summarize continuous variables
def summarize_continuous(data, column):
    summary = {}
    summary['Variable'] = column
    summary['Type'] = 'Continuous'
    summary['Missing Values'] = data[column].isna().sum()
    summary['Mean'] = data[column].mean()
    summary['Standard Deviation'] = data[column].std()
    summary['Min'] = data[column].min()
    summary['Max'] = data[column].max()
    summary['Median'] = data[column].median()
    return summary

# Summarize relevant variables
summaries = []
for column in census_data_filtered.columns:
    if census_data_filtered[column].dtype == 'object' or census_data_filtered[column].nunique() < 20:
        # Treat as categorical
        summaries.append(summarize_categorical(census_data_filtered, column))
    else:
        # Treat as continuous
        summaries.append(summarize_continuous(census_data_filtered, column))

# Convert summaries into a DataFrame for tabular display
summary_df = pd.DataFrame(summaries)
print(tabulate(summary_df, headers='keys', tablefmt='grid'))
+----+---------------------+-------------+------------------+-----------------+--------------------------------------------------------------------------------------------+----------+----------------------+-------+-------+----------+
| | Variable | Type | Missing Values | Unique Values | Top Levels | Mean | Standard Deviation | Min | Max | Median |
+====+=====================+=============+==================+=================+============================================================================================+==========+======================+=======+=======+==========+
| 0 | COMMUNITY AREA NAME | Categorical | 0 | 78 | {'Rogers Park': 1, 'Pullman': 1, 'Archer Heights': 1, 'Garfield Ridge': 1, 'Hegewisch': 1} | nan | nan | nan | nan | nan |
+----+---------------------+-------------+------------------+-----------------+--------------------------------------------------------------------------------------------+----------+----------------------+-------+-------+----------+
| 1 | HARDSHIP INDEX | Continuous | 1 | nan | nan | 49.5065 | 28.6906 | 1 | 98 | 50 |
+----+---------------------+-------------+------------------+-----------------+--------------------------------------------------------------------------------------------+----------+----------------------+-------+-------+----------+
Temperature Dataset:
temp_data.describe()
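+-------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
|       | Year        | Jan         | Feb         | Mar         | Apr         | May         | Jun         | Jul         | Aug         | Sep         | Oct         | Nov         | Dec         | Annual      |
+=======+=============+=============+=============+=============+=============+=============+=============+=============+=============+=============+=============+=============+=============+=============+
| count | 24.000000   | 24.00000    | 24.000000   | 24.000000   | 24.000000   | 24.000000   | 24.000000   | 24.000000   | 24.000000   | 24.000000   | 24.000000   | 24.000000   | 24.000000   | 24.000000   |
+-------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| mean  | 2011.500000 | 24.97500    | 27.150000   | 38.858333   | 49.408333   | 60.175000   | 70.266667   | 74.862500   | 73.645833   | 66.558333   | 53.783333   | 41.237500   | 30.029167   | 50.912500   |
+-------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| std   | 7.071068    | 5.08726     | 5.690266    | 4.881724    | 2.888044    | 2.868153    | 2.586531    | 2.900347    | 2.144960    | 2.641955    | 3.007623    | 4.040858    | 5.902356    | 1.789902    |
+-------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| min   | 2000.000000 | 15.70000    | 14.600000   | 31.700000   | 41.200000   | 55.200000   | 65.500000   | 69.400000   | 67.500000   | 61.900000   | 48.800000   | 33.600000   | 16.000000   | 47.500000   |
+-------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| 25%   | 2005.750000 | 21.22500    | 25.325000   | 34.975000   | 47.600000   | 57.975000   | 67.750000   | 72.975000   | 72.850000   | 64.550000   | 51.725000   | 38.875000   | 26.125000   | 49.750000   |
+-------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| 50%   | 2011.500000 | 24.55000    | 27.400000   | 38.550000   | 49.400000   | 60.000000   | 70.850000   | 74.950000   | 73.550000   | 66.850000   | 53.500000   | 41.700000   | 30.850000   | 50.850000   |
+-------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| 75%   | 2017.250000 | 28.90000    | 30.825000   | 41.900000   | 51.575000   | 61.775000   | 71.750000   | 76.625000   | 74.500000   | 69.100000   | 56.025000   | 43.925000   | 33.850000   | 51.925000   |
+-------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| max   | 2023.000000 | 35.80000    | 38.000000   | 53.500000   | 54.600000   | 66.100000   | 74.300000   | 81.000000   | 77.100000   | 70.300000   | 59.700000   | 48.200000   | 39.000000   | 54.500000   |
+-------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+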
After inspecting the “Geographic Cluster Name” column in the MCMF dataset, we found that there are 121,359 missing values for in-person programs. Since both Analysis 1 and Analysis 2 rely heavily on knowing which neighborhood each program belongs to, we decided to use the geographic information in the Community Boundaries dataset and the latitude and longitude information in the MCMF dataset to map programs to their respective neighborhoods. We accomplished this through two different methods. Our first method was used in Analysis 1 and our second was used in Analysis 2.
Our first method employs geospatial methods to impute missing values in the ‘Geographic Cluster Name’ column by utilizing latitude and longitude coordinates in the original dataset along with geographic community boundaries from a secondary MCMF dataset. To prepare for this, the community areas dataset was converted into a GeoDataFrame by interpreting the ‘the_geom’ column, which contains geographic boundary data. This allowed the polygons representing community areas to be used for spatial operations. Next, the program dataset (my_chi_df) was filtered to remove rows with missing latitude or longitude values, as these coordinates are essential for the imputation process. Valid latitude and longitude values were then converted into geometric Point objects, and the filtered data was transformed into a GeoDataFrame to enable spatial operations. Before conducting the spatial join, the ‘Geographic Cluster Name’ and ‘COMMUNITY’ columns in the datasets were standardized by converting text to uppercase and stripping whitespace to ensure consistency in comparisons. A spatial join was then performed between the program points and the community area polygons. This operation assigned each point to the community area polygon in which it falls. For programs with multiple locations, the most common community name (mode) for each program was determined by grouping the data by ‘Program Name’. This mapping of program names to community names was then applied back to the original dataset to fill in missing values in the ‘Geographic Cluster Name’ column.
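The name standardization and per-program mode step described above can be sketched without any geospatial dependencies. This is a minimal illustration, not the notebook's code: the row keys `program`, `cluster`, and `joined_community` are hypothetical stand-ins for 'Program Name', 'Geographic Cluster Name', and the community returned by the spatial join.

```python
from collections import Counter, defaultdict


def normalize(name):
    """Uppercase and strip whitespace so names compare consistently."""
    if name is None:
        return None
    cleaned = name.strip().upper()
    return cleaned or None


def impute_by_program_mode(rows):
    """Fill missing 'cluster' values with each program's most common joined community."""
    # Tally the spatially joined community names per program (hypothetical keys).
    votes = defaultdict(Counter)
    for row in rows:
        community = normalize(row.get("joined_community"))
        if community:
            votes[row["program"]][community] += 1

    filled = []
    for row in rows:
        cluster = normalize(row.get("cluster"))
        if cluster is None and votes[row["program"]]:
            # The mode: the community that appears most often for this program.
            cluster = votes[row["program"]].most_common(1)[0][0]
        filled.append({**row, "cluster": cluster})
    return filled


rows = [
    {"program": "Chess Club", "cluster": None, "joined_community": " loop "},
    {"program": "Chess Club", "cluster": None, "joined_community": "LOOP"},
    {"program": "Chess Club", "cluster": None, "joined_community": "Austin"},
    {"program": "Yoga", "cluster": " austin", "joined_community": None},
]
filled = impute_by_program_mode(rows)
```

One consequence of taking the mode per program name is that a program held in several community areas is assigned entirely to its most frequent one, which is worth keeping in mind when reading per-neighborhood counts.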
In our second method, we compared the neighborhood names in the MCMF dataset with the neighborhood names in the Community Boundaries dataset to see if there were any differences. We found that, aside from neighborhood names, some programs in the MCMF dataset used unstandardized names such as “Far South Equity Zone” and “Back of the Yards”, which also needed to be mapped. We extracted the programs that have both longitude and latitude information and either have no geographic cluster name or have an unstandardized one, and converted their coordinates into shapely Point objects. We also converted the multipolygons in the Community Boundaries dataset into shapely format. Next, for each longitude-latitude pair, we checked whether it falls inside any of the multipolygons that represent a neighborhood. After mapping, we reviewed the neighborhoods assigned to programs with unstandardized names. This step was necessary because some programs with unstandardized names lack latitude-longitude data, and we wanted to map them to the same neighborhoods as other programs with the same unstandardized name. However, upon review, we found that many unstandardized names, such as equity zones, were mapped to several different neighborhood names. To avoid inconsistencies, where some equity zones are converted into neighborhood names while others remain unchanged, we decided to create a new column, “Neighborhood,” dedicated exclusively to neighborhood names. Programs in equity zones that could not be mapped to a specific neighborhood are marked as “NA” in this column.
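The containment check at the heart of this method can be illustrated with a dependency-free ray-casting test. This is only a sketch of the idea: the analysis itself relied on shapely geometries, the function names and the two square "neighborhoods" below are hypothetical, and the sketch ignores polygon holes.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: a point is inside if a horizontal ray from it crosses the ring an odd number of times."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed by the ray.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_neighborhood(lon, lat, boundaries):
    """Return the first neighborhood whose rings contain the point, or 'NA' if none do."""
    for name, rings in boundaries.items():
        if any(point_in_ring(lon, lat, ring) for ring in rings):
            return name
    return "NA"


# Hypothetical toy boundaries: each neighborhood is a list of rings (one per polygon part).
boundaries = {
    "A": [[(0, 0), (2, 0), (2, 2), (0, 2)]],
    "B": [[(2, 0), (4, 0), (4, 2), (2, 2)]],
}
```

Looping over every polygon for every point is quadratic in the worst case, which is one reason a spatial join with an index (as in the first method) tends to scale better on large program tables.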
Other forms of imputation and data preparation are discussed in the individual analyses, as each pertains to only a single analysis.
4 Exploratory Data Analysis
4.1 Analysis 1: How Does Community Socioeconomic Status Relate to the Accessibility of In-Person Programs?
By Gina Valdivia
The goal of this analysis is to understand the relationship between community socioeconomic status (SES) and the accessibility of in-person programs in different neighborhoods. By analyzing the availability, affordability, and accessibility features of these programs, we aim to identify inequalities across different SES groups. The SES categories are determined using the Hardship Index and are divided into three bins: Low-SES, Mid-SES, and High-SES. Since the hardship index is a multidimensional measure of a community’s socioeconomic conditions, we decided it was an effective way to assess a community’s wellbeing. Communities with higher hardship index scores have worse economic and social conditions than communities with lower scores. Our analysis addresses how the density of programs varies across SES levels, whether there are differences in program types or costs among SES categories, and whether low-SES communities are receiving support features like scholarships, free transportation, or free food. We answer these questions using various visualizations that capture program density, availability of support features, and overall accessibility. We created bar plots and scatter plots to visualize program attributes across SES levels. We also used a line of best fit to understand the correlation between hardship and program count.
Regarding data preparation, we first filtered the dataset to include only face-to-face programs, because our assessment of accessibility depends on each program's physical location. We also created Socioeconomic Status (SES) bins from the Hardship Index in the Chicago Census data to categorize neighborhoods as High-SES, Mid-SES, or Low-SES. Finally, we merged the program data with the socioeconomic data to obtain the full set of information for each community, ensuring that each program had corresponding socioeconomic metrics.
We encountered a lot of missing data in the Geographic Cluster Name column from the My Chi. My Future data, making it difficult to categorize them geographically to merge with socioeconomic information of the region. We resolved this using geospatial imputation with Chicago Boundaries data. We utilized GeoPandas to do a spatial join with the latitude and longitude coordinates to match them with the appropriate community area polygon. We also encountered issues with the geographic cluster names being inconsistent, making it difficult to merge, so we converted them to uppercase and removed any extra whitespace.
To assess the impact of community hardship on the number of available programs, we created a scatter plot with a line of best fit between Program Count and the Hardship Index. The Pearson correlation coefficient (r) was -0.240 and P-value was 3.928e-02 for this graph. This p value is under 0.05, and thus indicates that the correlation was statistically significant.
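For reference, the correlation coefficient and the test statistic behind its p-value can be computed by hand. The sketch below uses only the standard library and hypothetical toy data; the p-value itself comes from comparing t against a Student's t distribution with n - 2 degrees of freedom, which libraries such as SciPy (`scipy.stats.pearsonr`) handle directly.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by the two standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0: no linear correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical toy data: hardship index vs. program count for four areas.
hardship = [1, 2, 3, 4]
programs = [4, 1, 3, 2]
r = pearson_r(hardship, programs)
t = t_statistic(r, len(hardship))
```

Because t grows with the square root of the sample size, a modest r can still be statistically significant when enough community areas are included, so significance and strength of association should be read separately.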
There are a few outliers that influence the slope of the line of best fit, but the narrow confidence interval suggests that their influence is limited. The correlation between Hardship Index and Program Count is negative: as hardship increases, the number of programs tends to decrease. With r = -0.240 the relationship is weak, but the line of best fit trends downward, suggesting that higher-hardship areas tend to have fewer programs.
To further investigate the relationship between socioeconomic status and program availability, we plotted the Average Program Count by Economic Status using SES bins. When using pd.qcut(), we handled duplicate bin edges by passing duplicates='drop', which allowed us to divide the data into quantiles without non-unique bin edges.
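The tertile binning can be sketched without pandas as follows. This is a stand-in for `pd.qcut`, not the notebook's code: the function name, labels, and toy hardship values are hypothetical, and the cut points use the standard library's `statistics.quantiles` with linear interpolation. Lower hardship is labeled as higher SES, matching the definition used in this analysis.

```python
import statistics


def ses_bins(hardship_by_area):
    """Split areas into three quantile bins by hardship index; low hardship means High-SES."""
    # Two cut points divide the sorted values into three roughly equal groups.
    cut_low, cut_high = statistics.quantiles(
        hardship_by_area.values(), n=3, method="inclusive"
    )
    bins = {}
    for area, hardship in hardship_by_area.items():
        if hardship <= cut_low:
            bins[area] = "High-SES"
        elif hardship <= cut_high:
            bins[area] = "Mid-SES"
        else:
            bins[area] = "Low-SES"
    return bins


# Hypothetical hardship scores for seven areas.
toy = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50, "F": 60, "G": 70}
bins = ses_bins(toy)
```

If many areas shared the same score, the two cut points could coincide and the middle bin would be empty; that is the situation `duplicates='drop'` guards against in `pd.qcut` by merging the repeated edges.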
The bar plot shows that High-SES areas have the most programs, while Low-SES areas have the fewest. This reinforces the previous observation that lower socioeconomic areas have fewer in-person opportunities than higher socioeconomic areas, indicating a potential lack of resources in underserved communities.
We suggest that My Chi. My Future allocate more resources to increasing the number of in-person programs in Low-SES areas, with a focus on outreach to ensure that underprivileged communities are aware of the programs available.
In addition to the overall count of programs offered to communities based on socioeconomic status, we analyzed the types of programs available by plotting Program Category against SES bins.
We noticed that Sports and Wellness programs dominate each SES category. However, Low-SES areas have significantly fewer Sports and Wellness programs compared to the higher SES areas. This discrepancy suggests potential inequalities in the types of opportunities available to different communities, especially in areas that may need them the most for well-being.
We also examined program accessibility by evaluating cost, the availability of financial support (scholarships), whether participants were paid, access to transportation services, and free food. For each of these features, we plotted the program count by SES bin.
The analysis of program accessibility across different socioeconomic status (SES) levels revealed several important trends and disparities. Free programs were found to be abundant in both Low-SES and High-SES neighborhoods, whereas Mid-SES neighborhoods had notably fewer free programs. This suggests a gap in accessibility for mid-level income communities, possibly indicating that these neighborhoods fall between eligibility for certain assistance programs while not having enough disposable income to afford other options. Regarding programs with scholarships and transportation support, High-SES areas had the most available, followed by Mid-SES and then Low-SES neighborhoods. This disparity suggests that Low-SES neighborhoods, which would benefit most from such support, are underserved, revealing a critical inequity in the distribution of program benefits.
Further analysis showed that programs where participants are paid were most common in High-SES areas, with Low-SES areas also having a reasonable number of such programs, whereas Mid-SES neighborhoods had the lowest number. This indicates that Mid-SES areas are lacking in paid program opportunities, highlighting a gap in incentivized participation in these communities. Additionally, the analysis of programs with free food showed the highest availability in Low-SES neighborhoods, suggesting targeted efforts to address food insecurity in areas most in need. However, both Mid-SES and High-SES areas had significantly fewer programs with free food, suggesting that the broader issue of food insecurity across all SES levels may not be adequately addressed.
4.2 Analysis 2: How has the availability of equity-focused features among programs changed over time based on neighborhood Socioeconomic status?
By Luna Xu
To explore this question, I began by reviewing all columns in the “My Chi My Future” (MCMF) dataset to identify those relevant to equity-focused measures. I identified “Scholarship Available,” “Participants Paid,” “Transport Provided,” and “Has Free Food” as key equity-related features. Since Analysis 4 will specifically focus on transportation, I narrowed the scope of equity-focused features to: “Scholarship Available,” “Participants Paid,” and “Has Free Food.”
To understand neighborhood socioeconomic status (SES), I examined the Census dataset. Since the hardship index incorporates six selected socioeconomic indicators, I decided to base my analysis on it, as it provides the most holistic view. The hardship index functions like a ranking, with each neighborhood having a unique score, where 99 represents the highest level of hardship and 1 the lowest. Therefore, I decided to bin neighborhoods into three equal buckets: low-SES, mid-SES, and high-SES. I also dropped the “chicago” row, which is a citywide total.
After merging the Census dataset with the SES bins and the MCMF dataset on the neighborhood variable, I explored the overall program distributions by neighborhood SES. Specifically, I counted the number of distinct programs over the years in each neighborhood and visualized them on the Chicago map using GeoPandas and Matplotlib’s pyplot. Noticing several extreme values (e.g., one neighborhood had only 6 programs), I applied the LogNorm function from Matplotlib’s colors module for better visualization, using the minimum and maximum program counts across all neighborhoods.
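To see why a logarithmic color scale helps here, the mapping that a log normalization performs can be written out directly: values are placed between 0 and 1 according to their logarithm rather than their raw size. The sketch below is an illustration only; the minimum of 6 programs comes from the text, while the maximum of 600 is a hypothetical round number chosen for the example.

```python
import math


def log_norm(value, vmin, vmax):
    """Map value into [0, 1] on a log scale, so that vmin -> 0 and vmax -> 1."""
    return (math.log(value) - math.log(vmin)) / (math.log(vmax) - math.log(vmin))


def linear_norm(value, vmin, vmax):
    """Plain linear scaling, for comparison."""
    return (value - vmin) / (vmax - vmin)


vmin, vmax = 6, 600  # 600 is a hypothetical maximum program count
mid_log = log_norm(60, vmin, vmax)        # about 0.5: halfway in orders of magnitude
mid_linear = linear_norm(60, vmin, vmax)  # about 0.09: nearly indistinguishable from the minimum
```

On a linear scale, a few very large neighborhoods compress everything else into the bottom of the color range; the log scale spreads the small and mid-sized counts across more of the colormap.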
The maps above show that, while the contrast between the three SES bins is not stark, the low-SES bucket generally contains more neighborhoods with few programs, including neighborhoods like Burnside, which has only 6 programs over a five-year period.
To further explore the program distributions among the three SES buckets, I plotted a line graph showing the total number of distinct programs in each SES bin for each year. Since not all 2025 programs had been entered into the dataset yet, I excluded the 2025 data.
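The counting step can be sketched in plain Python as below. It mirrors a pandas group-by on start year and SES bin followed by a distinct count of program IDs, but the function name and the toy records are hypothetical and the real analysis used the merged MCMF and Census tables.

```python
from collections import defaultdict


def distinct_programs_by_year_and_bin(records, exclude_years=(2025,)):
    """Count distinct program IDs per (start year, SES bin), skipping incomplete years."""
    seen = defaultdict(set)
    for start_year, ses_bin, program_id in records:
        if start_year in exclude_years:
            continue
        # A set keeps each program ID once, even if it has many sessions or locations.
        seen[(start_year, ses_bin)].add(program_id)
    return {key: len(ids) for key, ids in sorted(seen.items())}


# Hypothetical records: (start year, SES bin, program ID).
records = [
    (2023, "low-SES", 101),
    (2023, "low-SES", 101),  # repeated session of the same program
    (2023, "high-SES", 202),
    (2024, "low-SES", 101),
    (2024, "low-SES", 303),
    (2025, "low-SES", 404),  # dropped: 2025 is incomplete
]
counts = distinct_programs_by_year_and_bin(records)
```

Counting distinct IDs rather than rows matters because a single program can appear many times in the dataset, once per session or location.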
Now, we can clearly see that, in general, the number of distinct programs increases across all three SES buckets over the years, with high-SES neighborhoods showing the highest rate of increase. From 2022 to 2024, the total number of programs in low-SES neighborhoods is gradually approaching that of mid-SES neighborhoods. This suggests that, while a gap still exists in the number of programs offered across SES buckets, efforts are being made to bridge the equity gap in program availability.
I plotted a boxplot to further examine the distribution of programs among neighborhoods within each SES bucket. As shown below, the high-SES bucket generally has a more scattered distribution, especially in 2023 and 2024. Mid-SES and high-SES neighborhoods also account for most of the outliers, indicating that certain mid- or high-SES neighborhoods have far more programs than most other neighborhoods. After sorting by program count, I found that Irving Park, Near West Side, Morgan Park, Loop, and Lincoln Square are the five neighborhoods with the highest program counts, with Irving Park consistently having more programs each year than any other neighborhood.
I looked into other factors to understand why these five neighborhoods have the most programs. I found that four of the five neighborhoods are located in or near downtown Chicago. One possible explanation is that, because people are more likely to travel downtown, programs in downtown neighborhoods serve both the people who live in the area and those who work in or visit it.
Then, I looked into the distribution of programs with equity features (Scholarship Available, Has Free Food, Participants Paid) among the three SES buckets over the years. As shown below, programs that offer scholarships are concentrated in high-SES neighborhoods, and programs that have free food are concentrated in low-SES neighborhoods, especially in 2023 and 2024. There is no clear pattern in how programs that pay participants are distributed, but high- and mid-SES neighborhoods have relatively more paid programs. Free food programs also increased across all three SES buckets over time. To further explore the factors behind these distributions, I looked into each equity feature individually.
Since academic programs are the ones that typically offer scholarships, my assumption was that high-SES neighborhoods have more scholarship programs because more academic programs are offered there. To validate this assumption, I separated programs by category. However, the original dataset has too many categories, and several of them belong to academic programs. Therefore, I grouped the categories into four buckets: Career & Life Skills, STEM & Writing, Arts & Humanity, and Sports & Wellbeing. Among them, STEM & Writing and Arts & Humanity are academic programs. I grouped STEM and Writing together because both are considered critical to academic performance, especially in higher education. After visualizing the result in a clustered bar graph, I found that high-SES neighborhoods do have more academic programs, both STEM & Writing and Arts & Humanity. Additionally, high-SES neighborhoods have more scholarship programs in every category than mid-SES and low-SES neighborhoods, indicating a severe financial inequity among programs.
While the general trend shows that low-SES neighborhoods have more free food programs, I wanted to look more closely at each neighborhood, not just at the broad SES level. For example, among neighborhoods within the low-SES bucket, is there an equal distribution of free food programs? At the neighborhood level, can we still observe a positive correlation between the neighborhood hardship index and the number of free food programs offered? To answer these questions, I plotted a scatterplot with a trendline, with each neighborhood as a data point.
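As a rough illustration of what a trendline and correlation summarize in such a scatterplot, the sketch below fits an ordinary least-squares line and computes the Pearson correlation by hand. The function name and the toy numbers are assumptions for this example; the report's actual plotting code and data are not shown here.

```python
import math


def fit_trendline(xs, ys):
    """Return (slope, intercept, pearson_r) for a least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, pearson_r


# Invented values: hardship index (x) vs. free food program count (y).
hardship = [10, 35, 60, 85, 95]
free_food = [3, 5, 4, 9, 1]
slope, intercept, r = fit_trendline(hardship, free_food)
```

A positive `slope` with a modest `r` would correspond to the kind of mild positive relationship with notable outliers described in this analysis.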
The result shows a mild positive correlation between the neighborhood hardship index and the number of free food programs, meaning that neighborhoods with more hardship do tend to have more free food programs. However, we can observe several extremely high values, indicating that a few neighborhoods have far more free food programs than most others. I sorted the dataset by the number of free food programs offered and found that Austin, Brighton Park, Gage Park, South Lawndale, and Near West Side are the five neighborhoods with the most free food programs. Austin has a larger population than most neighborhoods, which could account for its relatively large number of free food programs. Additionally, among these top five neighborhoods, Near West Side is a high-SES neighborhood. Since it is near downtown, it makes sense for it to have more free food programs, as downtown generally has a more active population. However, many low-SES neighborhoods are located on the south side of Chicago, which makes it logistically harder for their residents to reach an area like Near West Side than it is for residents of neighborhoods near downtown (which are typically mid- and high-SES neighborhoods). Furthermore, some neighborhoods with very high hardship levels have very few free food programs (for example, Riverdale, the neighborhood with the second-highest hardship level, had only 1 free food program over the years). We can therefore conclude that there is still a sizable food inequity despite the general trend of low-SES neighborhoods having more free food programs.
For the participants paid variable, there is no detectable difference among the three neighborhood SES buckets, and I noticed that many programs have NaN values for it. Therefore, I wanted to improve the data quality. Specifically, I wondered whether some programs do in fact pay their participants but show up as unpaid or NaN. To find out, I first investigated the word frequency in the descriptions of programs that pay participants, in order to identify keywords related to financial support. Using the re, nltk, and collections.Counter libraries, I removed stopwords and unreadable parts, parsed the descriptions into words, and counted the number of times each word appears. As shown in the graph below, among the top 20 most common words, “paid” and “stipend” are the two that relate to financial support.
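The word-frequency step might look something like the sketch below. To keep it self-contained it uses a small hand-written stopword set as a stand-in for NLTK's English stopword list, and the sample descriptions are invented; both are assumptions for illustration only.

```python
import re
from collections import Counter

# Stand-in stopword set (assumption: the report used NLTK's stopword list,
# which is much larger than this).
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "for",
             "is", "are", "will", "with", "this", "our"}


def top_words(descriptions, n=20):
    """Return the n most common non-stopword tokens across descriptions."""
    counts = Counter()
    for text in descriptions:
        # Lowercase and keep alphabetic runs only, dropping digits/punctuation.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


# Invented sample descriptions of programs that pay participants.
descriptions = [
    "Participants will receive a stipend for completing the program.",
    "This is a paid internship with a weekly stipend.",
]
common = top_words(descriptions)
```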
Next, I identified programs whose descriptions contain either of the two keywords “paid” and “stipend” but that are labelled as “unpaid” or NaN in the “Participants Paid” column and as False in the “Scholarship Available” column. In this way, I hoped to find programs that pay participants but do not show up as paid programs or scholarship programs.
To ensure the keywords accurately reflect that a program compensates its participants, I used the re library to parse the descriptions into sentences and identify those containing the words “paid” or “stipend”. Upon reviewing the matching sentences, I found that all instances of “stipend” reliably indicated programs that pay their participants. However, the keyword “paid” introduced noise, such as mentions of “paid parking.” To reduce false positives, I chose to use “stipend” as the sole indicator of a paid program. I then added a new column, “Stipend”, and marked programs offering stipends as True.
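A minimal sketch of this sentence-level keyword check is given below. The function names, the naive sentence splitter, and the example description are assumptions made for illustration; the exact regular expressions used in the analysis are not reproduced here.

```python
import re


def sentences_with(keyword, description):
    """Return the sentences of a description that mention the keyword."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", description.strip())
    # Whole-word match, case-insensitive, allowing a simple plural "s".
    pattern = re.compile(rf"\b{re.escape(keyword)}s?\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]


def has_stipend(description):
    """Flag used to fill a hypothetical 'Stipend' column."""
    return bool(sentences_with("stipend", description))


# Invented example showing why "paid" alone is a noisy signal.
example = "Paid parking is nearby. Youth earn a stipend each week."
paid_hits = sentences_with("paid", example)
stipend_flag = has_stipend(example)
```

Reviewing `paid_hits` by eye is what surfaces false positives such as "paid parking", which motivates relying on "stipend" alone.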
Finally, I created an updated heatmap counting programs that have either the “Paid, Type Unknown” value in the “Participants Paid” column or True in the “Stipend” column. Through a side-by-side comparison of the previous heatmap and this updated version, we can observe that the number of programs that pay participants increased significantly in 2024 for low-SES neighborhoods. Comparing the two graphs, we can also see that many programs in low-SES neighborhoods pay participants through stipends without being labelled as such in the “Participants Paid” column, while this is much less common in mid- and high-SES neighborhoods. This suggests that while efforts to provide financial support in low-SES neighborhoods have improved, these opportunities are not being adequately advertised.
4.3 Analysis 3: How do average temperatures and the season(s) in which a program takes place influence the categories and costs of programs offered for different age groups?
By Isabella Seo
For my EDA, I wanted to explore how the temperature and season in which a program takes place influence the type (i.e., the Program Category) and the cost of programs. This exploration provides insight into when programs occur throughout the year and how the types of programs offered for each age group are affected.
In order to examine temperature, I loaded the Chicago temperature dataset. This dataset contains the average temperature of each month for each year from 2000 to 2024.
Because the year was not over at the time of this analysis, I excluded data from 2024 and filtered the dataset to include only values from 2000-2023. I then used the remaining values to find the average for each month across 2000-2023. For example, the average January temperature across all years from 2000 to 2023 was 24.975°F.
With regard to data cleaning and preparation, the MCMF initiative stated that programs with a minimum age of 25 should be disregarded, as their goal is to focus on youth programs. They also stated that programs that include ages of 25 should be considered family programs.
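The filtering and averaging step can be sketched as below. The record layout `(year, month, temperature)` and the toy values are assumptions for the example and do not reflect the actual structure of the temperature dataset.

```python
from collections import defaultdict
from statistics import mean


def monthly_averages(records, first_year=2000, last_year=2023):
    """Average temperature per month over an inclusive range of years."""
    by_month = defaultdict(list)
    for year, month, temp_f in records:
        if first_year <= year <= last_year:  # drops the incomplete 2024
            by_month[month].append(temp_f)
    return {month: mean(temps) for month, temps in by_month.items()}


# Invented records: (year, month abbreviation, average temperature in °F).
records = [
    (2022, "Jan", 24.0),
    (2023, "Jan", 26.0),
    (2024, "Jan", 40.0),  # excluded by the year filter
    (2023, "Jul", 75.0),
]
averages = monthly_averages(records)
```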
Next, I realized that the dataset provided only a minimum age and a maximum age. For my exploration, I needed to know which programs served which ages, and the two endpoints alone did not provide that information. At first, I tried using only the minimum and maximum ages for each program; however, that did not capture all the ages the program served. To overcome this issue, I created a range of ages for each program (from the minimum to the maximum age) and then expanded the range. The result was a DataFrame in which each age a program served became its own row. For example, a program with an age range of 5-9 would be split into 5 different rows: one row with an age of 5, one row with an age of 6, and so on.
I also had to do this for months, as there was only a start month and an end month in the original dataset. I created a range, and expanded it similarly to what was done for ages.
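The range expansion described above can be illustrated with plain Python. The field names are hypothetical, and this sketch does not handle month ranges that wrap around the end of the year; in pandas, a comparable result is typically obtained by building a list column and calling `DataFrame.explode`.

```python
def expand_range(program, low_key, high_key, out_key):
    """Turn one program record into one row per value in an inclusive range."""
    return [
        {"program_id": program["program_id"], out_key: value}
        for value in range(program[low_key], program[high_key] + 1)
    ]


# Hypothetical program serving ages 5-9 and running from June to August.
program = {"program_id": 1, "min_age": 5, "max_age": 9,
           "start_month": 6, "end_month": 8}

age_rows = expand_range(program, "min_age", "max_age", "age")
month_rows = expand_range(program, "start_month", "end_month", "month")
```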
Then, I binned both ages and months. For ages, I used the bins 0-5, 6-10, 11-14, 15-18, and 19-25. I chose these bins based on my own intuition and knowledge of school age groups. I attempted to match the bins to different school ages; however, they are not exact. For the purpose of this exploration, I did not include any programs with ages greater than 25, as these are no longer “youth” programs. For months, I grouped them into the four seasons of the year: Winter (Dec, Jan, and Feb), Spring (Mar, Apr, and May), Summer (Jun, Jul, and Aug), and Autumn (Sep, Oct, and Nov).
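A small sketch of the two binning rules follows, using the bin edges and season groupings stated above. The function and constant names are assumptions for the example; with pandas, the age binning would more commonly be done with `pandas.cut`.

```python
from bisect import bisect_left

AGE_BIN_UPPER = [5, 10, 14, 18, 25]  # inclusive upper edge of each bin
AGE_BIN_LABELS = ["0-5", "6-10", "11-14", "15-18", "19-25"]

SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}


def age_bin(age):
    """Return the age-bin label, or None for ages outside 0-25."""
    if age < 0 or age > 25:
        return None
    return AGE_BIN_LABELS[bisect_left(AGE_BIN_UPPER, age)]


def season(month_number):
    """Map a month number (1-12) to its season."""
    return SEASON_BY_MONTH[month_number]
```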
In order to analyze temperature, I needed to map each month’s average temperature onto the month in which each program occurs. However, in the MCMF dataset, months were referred to by their number (Jan = 1), while in the temperature dataset, months were referred to by their abbreviation. I converted the months in the MCMF dataset to their abbreviations and then mapped the temperature dataset onto the MCMF data.
To visualize the distribution, below is a graph of the number of programs in each month. Note that there are more programs in May, June, September, and October. These coincide with the end and beginning of school, respectively. Additionally, there are generally fewer programs around the winter season (December and January).
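The month conversion and temperature lookup could be sketched as follows. The row layout and function name are assumptions for illustration; the January value is the 2000-2023 average reported earlier in this analysis.

```python
MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def attach_temperature(program_rows, avg_temp_by_abbr):
    """Replace month numbers with abbreviations and add the monthly average."""
    enriched = []
    for row in program_rows:
        abbr = MONTH_ABBR[row["month"] - 1]  # month numbers are 1-based
        enriched.append({
            **row,
            "month": abbr,
            # None when the temperature table has no entry for the month
            "avg_temp_f": avg_temp_by_abbr.get(abbr),
        })
    return enriched


# Hypothetical expanded program rows, one per (program, month).
rows = [{"program_id": 1, "month": 1}, {"program_id": 1, "month": 2}]
enriched = attach_temperature(rows, {"Jan": 24.975})
```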
[Figure: How Many Programs are in Each Month]
To look at the trends, I wanted to examine the relationship, if any, between ages and seasons/temperatures. Looking at the two heatmaps below, we can see the highest concentration of programs for ages 19-25 in the summer. There are generally more programs for 19-25 year olds than for any other age group. The age group with the second-highest number of programs in every season and temperature range is 6-10 year olds. Interestingly, there are also many programs for 6-10 year olds in the warmer temperatures (66+°F), but this is not directly visible when looking at the summer season.
For the first part of my analysis, I wanted to look at the number of categories for each season and temperature range and for each age group. I started with modeling the seasons.
[Figure: Category Count for each Season]
The graph above reaffirms the trend that there are more activities overall in the summer. However, it also demonstrates that the relative number of each type of program tends to stay consistent throughout the year. For example, the most prevalent program type is always Sports & Wellness, followed by Music & Art. I found this interesting, as I originally hypothesized that program popularity would fluctuate by season.
Next I explored these category trends in seasons by age groups.
As shown in the figure above, the category type frequencies seem to stay consistent throughout all the seasons. However, here we can see that certain activities are more prevalent for different age groups. For example, there are mostly Reading and Writing programs directed towards ages 6-10, while there are mostly Digital Media activities directed towards ages 11-16. This is likely because those are the kinds of activities planners believe to be most relevant for each age group; however, it points to a lack of diversity in what each age group is offered.
Pivoting to temperature, I binned the temperatures into 5 bins that I determined myself. We see a similar trend in which most programs occur in the warmer temperatures. Yet again, the same types of programs stay popular year round (see graph below).
It should be noted that there is a dip in the number of observations for 56-65°F, which is likely due to how I chose to bin the temperatures. I attempted to make bins of equal sizes to remedy this; however, that would not show differences between temperature ranges, so I used bins of unequal sizes.
Splitting into age groups, we see the same trends as before. We also see that more of the most popular activity type occurs in the warmer temperatures, coinciding with what we see in the season plots.
[Figure: y-axis 'Count (in millions)']
Next, I wanted to see how program price changes with season and temperature. As seen in the graph below, there are more free programs in summer and autumn. I hypothesize that this is due to the start and end of classes/school. Interestingly, there is also an increase in the number of paid programs in the springtime. This is the only season in which there are more programs that are ‘$50 or Less’ than free ones. I do not have a hypothesis as to why this is; however, it suggests a need for more programs in the spring.
[Figure: y-axis 'Count (in millions)']
Examining the figure above, we can see that there are overall more free programs and programs that are $50 or Less for 19-25 year olds. Additionally, in winter and spring, for all ages, there seems to be an increase in the number of paid programs in proportion to the number of programs offered. For 15-25 year olds, there is an increase in $50 or Less programs, while for 0-14 year olds, there is an increase in More than $50 programs. There also seem to be generally more paid programs for 6-14 year olds than for any other age group. I theorize that this is because parents are expected to pay for younger children, while 19-25 year olds are thought to pay for themselves, so more free and cheaper programs are offered to keep them accessible. However, to be more inclusive of lower-income families, perhaps there should be more free programs for all ages and throughout all seasons.
[Figure: y-axis 'Count (in millions)']
Turning to temperatures, we see a similar pattern of more free programs in the warmer temperatures (66+°F range). However, we can also observe an increase in $50 or Less programs in below-freezing temperatures and in the 46-65°F range. In the 33-45°F range, though, there are more free programs, mirroring the 66+°F range more closely.
In the figure above, we can see the same trend as before, in which colder temperatures have more expensive programs, especially for younger age groups. Interestingly, though, temperatures from below freezing up to 65°F show a consistent program price distribution within each age group. For example, for 6-10 year olds, there are consistently more paid programs costing More Than $50 throughout all of these temperatures. That holds until 66 degrees and above, where free programs are most prevalent for every age group. I cannot think of a reason for this other than that warmer temperatures typically signify summer, and summer usually means a break from school (as discussed before). More families and students would therefore be interested in activities, and to be inclusive of lower-income families, there are more free programs during that time.
Overall, I found that programs are more frequent in the warmer temperatures and that the same programs tend to be offered year round for each age group. Additionally, there are differences in the price of programs offered, both seasonally and by age group. To further expand the MCMF initiative, I suggest including a wider variety of program categories for different age groups (e.g., more digital media programs for older groups) and year round. There should also be a more even distribution of free programs across age groups for all seasons/temperatures. It should not be expected that younger children must pay more for programs.
Further exploration should look into possible reasons for these gaps in program availability by season and temperature. Perhaps a variable that I did not explore here caused the lack of programs in spring or increased their cost. Additionally, I theorized about the role of families and school in program cost and distribution; however, this is not definite and should be explored further.
4.4 Analysis 4: How does proximity to public transportation and meeting type affect assistance offered?
By Owen Handelman
To address how location and meeting type affect the assistance offered by a program, assistance must first be defined. I will use transportation provided, has free food, participants paid, and scholarship available as the metrics of assistance. To answer the first question, how meeting type affects the assistance provided, we can compare online and face-to-face programs on every metric other than transportation provided.
We can see that online programs are more likely to assist financially than face-to-face programs, while face-to-face programs are more likely to have free food. Face-to-face programs being more likely to have free food is somewhat expected. It is important to note that the proportion of programs offering any kind of assistance is very low.
[Figure: CTA Train Station Map]
Looking at the maps of CTA bus and train stations, it is evident that bus stations are much more uniformly spread out across the city, while the train stations are concentrated on getting individuals downtown. Given the difference in distribution, it is reasonable to conclude that a program’s proximity to a bus station means something different than its proximity to a train station in terms of overall connectedness to the CTA and public transit in general.
[Figure: Programs That Do Provided Transportation]
The maps of programs that do and do not offer transportation provide insight into where programs that provide transportation are located. Programs that do not provide transportation appear to be evenly distributed across Chicago, while programs that do provide transportation appear to be concentrated in the downtown area or along primary routes connected to the downtown area.
[Figure: Programs That Do Provided Transportation with CTA Train Stations]
Plotting the programs that do provide transportation together with CTA train stations appears to confirm the above suspicion. With the notable exceptions of the program well west of Chicago and the programs south of the train system, programs that provide transportation appear to be located roughly along train lines, which can be considered the major transportation routes (for people).
Calculating the distance between each program and its closest bus and train station using the haversine formula allows us to analyze any relation between distance to public transport and assistance. Looking first at the relationship noted above between programs that provide transportation and their closest train line, we see:
[Figure: Programs that provide Transport]
The concentration of programs that provide transportation downtown or along major transportation routes can be seen in the difference in the distribution of closest train line when comparing all programs to only those that provide transportation. Programs that provide transportation are more likely than programs overall to be closest to the Green Line, which makes sense as the Green Line runs through downtown Chicago (and west).
Next, we move on to differences in average distance to the closest transportation station between programs that offer assistance and those that do not.
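For reference, the haversine distance and nearest-station lookup used in this part of the analysis can be sketched as follows. This is a minimal illustration assuming coordinates in decimal degrees and a mean Earth radius of 6,371,008.8 m converted to feet; the station names and coordinates are hypothetical placeholders and not taken from the CTA data.

```python
import math

# Mean Earth radius in feet (assumption: 6,371,008.8 m; 1 ft = 0.3048 m).
EARTH_RADIUS_FT = 6_371_008.8 / 0.3048


def haversine_ft(lat1, lon1, lat2, lon2):
    """Great-circle distance in feet between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_FT * math.asin(math.sqrt(a))


def nearest_station(program_latlon, stations):
    """Return (station_name, distance_ft) for the closest station."""
    lat, lon = program_latlon
    return min(
        ((name, haversine_ft(lat, lon, s_lat, s_lon))
         for name, (s_lat, s_lon) in stations.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical stations and program location, for illustration only.
stations = {"Station A": (41.880, -87.630), "Station B": (41.950, -87.650)}
name, distance_ft = nearest_station((41.881, -87.631), stations)
```

Comparing every program against every station is quadratic in the worst case, which is usually acceptable for a few thousand programs and stations; a spatial index would be the natural next step for larger data.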
[Figure: x-axis 'Scholarship Available', y-axis 'Distance to Closest Bus Station (ft)']
We can see that there does not appear to be a significant difference in average distance to the nearest bus station when comparing programs that provide assistance with those that do not (comparing programs on one aspect of assistance at a time). One possible explanation for this observation is that the apparently even distribution of the bus stations makes distance to the nearest bus station uninformative. Another possible explanation is that there are real differences in distance to the nearest bus station, but the low sample size and high variability make the margin of error too large to detect them. Either way, the proximity of programs that do not provide assistance to their nearest bus station indicates that bus access may be underutilized by programs, especially when it comes to providing transportation. Programs that do not provide transportation are quite close to their nearest bus station (about 700 ft, judging from the graph).
Looking at distance to nearest train station we find:
[Figure: x-axis 'Scholarship Available', y-axis 'Distance to Closest Train Station (ft)']
Only one form of assistance shows a significant difference: programs that have free food are closer to their nearest train station than those that do not. Overall, the average distance to the nearest train station for any type of program is around a mile.
Overall it can be seen that there appears to be an underutilization of the CTA public transportation system to encourage participation in programs. Most programs are well within walking distance to their closest bus stop.
5 Conclusions
Overall, we found that there are fewer programs in neighborhoods of lower socioeconomic status. Furthermore, Low-SES neighborhoods generally have more access to free programs and food support, but they are underrepresented in terms of transportation assistance and scholarship availability, both of which are crucial for increasing participation. Mid-SES areas are particularly underserved, with notably fewer free programs, scholarships, and paid opportunities compared to other SES groups. These communities often find themselves caught in a gap, not receiving targeted support despite facing economic barriers. In contrast, High-SES areas have the greatest access to scholarships, transportation, and paid opportunities, which indicates a misalignment in resource distribution that favors areas that are less economically vulnerable. These disparities point to inequitable access to resources that can hinder community engagement and growth, particularly in Low-SES and Mid-SES neighborhoods.
Analysis 2 reveals ongoing inequities in program availability and support across SES buckets, despite some positive trends. High-SES neighborhoods consistently have greater access to academic programs and scholarships. While low-SES neighborhoods generally have more free food programs, the uneven distribution within these areas, where neighborhoods with high hardship levels like Riverdale remain underserved, reveals persistent logistical and structural challenges. Similarly, the prevalence of mislabeled or under-advertised stipend opportunities in low-SES neighborhoods highlights the importance of improving program transparency and marketing. By addressing these gaps, stakeholders can better align resources with community needs, ensuring that equity-focused efforts reach the populations that need them most.
Additionally, programs are more frequently available for youth in the warmer temperatures and the summer season, and the same programs tend to be offered year round for each age group. Those same programs also tend to be the most popular year round, indicating a lack of variation in opportunities across ages. There are also fewer programs during the winter and spring, and the cost of those programs is typically higher than in the summer season. In general, there tend to be some differences in program availability based on age, specifically in terms of price. There are generally more cost-friendly program choices for older youth (ages 19-25) than for younger children; however, it is unknown why this is.
Finally, access to programs could be more equitable if programs were encouraged to subsidize bus and possibly train fare, giving individuals greater access to programs, including program types that may not be offered in their area. Online programs may be better suited to promoting equity across economic ability, as they are more likely than face-to-face programs to offer financial assistance.
6 Recommendations to stakeholder(s)
To address disparities in program accessibility, we recommend targeted investments in communities with a high Hardship Index by introducing more programs and ensuring local government support, including mobile initiatives such as pop-up events and community outreach to reach those in the hardest-hit areas. Partnerships with NGOs and community-based organizations can also help bridge gaps in service delivery by bringing programs directly to these high-hardship communities. Expanding Sports and Wellness programs in Low-SES areas would also be beneficial, as these programs promote both physical and mental well-being. It is equally important to diversify program offerings in Low-SES neighborhoods, ensuring that residents have access to arts, educational, and skill-building programs that contribute to holistic development. We further suggest allocating more resources to increase the number of in-person programs in Low-SES areas, with a focus on outreach to make sure underprivileged communities are aware of available opportunities. Mid-SES neighborhoods would also benefit from increased availability of free programs, as well as greater access to scholarships and paid opportunities. These communities often fall into a gap—where they are not poor enough to qualify for targeted assistance but not affluent enough to afford available resources. To address these gaps, Low-SES neighborhoods need increased transportation support and needs-based scholarships. While these neighborhoods benefit from free food programs, a lack of transport assistance remains a significant barrier to participation. A more equitable redistribution of resources, such as scholarships and transportation support, across communities could help bridge these disparities. 
Implementing community outreach initiatives to raise awareness about scholarships, transportation, and paid opportunities would be beneficial, especially for Low-SES and Mid-SES neighborhoods, where lack of awareness might be contributing to reduced participation.
My CHI. My Future. (MCMF) initiative leadership should consider directing organizers in Irving Park, Near West Side, Morgan Park, and Lincoln Square to hold more programs in low-SES neighborhoods that are far from the downtown area, as these four neighborhoods have had the most programs over the years. MCMF leadership should consider encouraging more academic programs (both STEM & Writing and Arts & Humanity) in low-SES neighborhoods, and program organizers should consider providing more scholarship options for low-SES neighborhoods or providing scholarships based on need. MCMF leadership should also encourage more program organizers to provide free food options in low-SES neighborhoods. Finally, MCMF leadership should feature or promote programs that are held in low-SES neighborhoods and pay participants. Program organizers whose programs pay participants should put more effort into marketing this by accurately entering the information in the “Participants Paid” column, not just in the description. In turn, when collecting information, MCMF leadership should consider placing equity-focused features (scholarship available, has free food, participants paid) at the front so that program organizers do not forget to fill them out. If possible, MCMF leadership should consider building a data screening tool that scans program descriptions and extracts information related to equity-focused features (such as the amount of stipend provided).
To address the discrepancies in category and program price distribution throughout the year and across age groups, we recommend creating a greater diversity of programs across age groups rather than focusing a certain type of activity on each age. Additionally, it may be more effective to have different kinds of programs run throughout the year. We also recommend that the MCMF program create more free programs at all times of the year to allow for more equitable opportunities across Chicago youth. There should also be more free activities for all age groups, not just for older teenagers and young adults. It should not be expected that younger children must pay more for programs throughout the colder months.
We recommend encouraging more online programs to offer financial assistance as a way of enabling more economically disadvantaged individuals to access programs, thereby increasing equity. This encouragement can take the form of direct pressure on online programs as well as the inclusion of more of them. Additionally, we recommend encouraging programs to subsidize CTA transit as a means of providing transportation assistance, as most programs are within walking distance of a bus stop. Greater transportation assistance would help to reduce disparities in program and program type availability across the different neighborhoods of Chicago.
References
[1] Bivand, R., Keitt, T., Rowlingson, B., and Hijmans, R.J. (2024). Shapely: Manipulation and Analysis of Geometric Objects in Simple Feature Standard. R package version 1.8.1. https://CRAN.R-project.org/package=shapely
[2] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.D., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830. http://scikit-learn.org
[3] Bird, S., Loper, E., and Klein, E. (2009). Natural Language Processing with Python. O’Reilly Media, Inc. https://www.nltk.org/
[4] Harris, C.R., Millman, K.J., van der Walt, S.J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J.D., Berg, S., Smith, N.J., Kern, R., Picus, M., Haldane, A., del Río, J., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard, K., Reddy, T., McDougal, T., and Harris, M. (2020). Array Programming with NumPy. Nature, 585(7825), 357-362. https://numpy.org/
[5] McKinney, W. (2010). Data Structures for Statistical Computing in Python. In S. van der Walt & J. Millman (Eds.), Proceedings of the 9th Python in Science Conference. https://pandas.pydata.org/
[6] Hunter, J.D. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering, 9(3), 90-95. https://matplotlib.org/
[7] Waskom, M. (2021). Seaborn: Statistical Data Visualization. Journal of Open Source Software, 6(60), 3021. https://seaborn.pydata.org/
[8] Muenchow, J., and other contributors. (2024). GeoPandas: Python for Geographic Data Analysis and Mapping. https://geopandas.org/
[9] Alkam, I. (2018). tabulate: Pretty-print tabular data in Python, version 0.8.10. https://pypi.org/project/tabulate/